Skip to content

PERF faster head, tail and size groupby methods #5533

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 20, 2013

Conversation

hayd
Copy link
Contributor

@hayd hayd commented Nov 17, 2013

Try again with #5518.

Massive gains in groupby head and tail, adds more tests for these, slight speed improvement in size (not as much as I'd hoped, basically iterating through grouper.indices is slow :( ).

As mentioned before, I added a helper function to prepend as_index to index (if that makes any sense), I think it could be faster, and also using it in apply may fix some bugs there...

@hayd
Copy link
Contributor Author

hayd commented Nov 18, 2013

@jreback I fixed this up. A slight api change is that head now respects original frame order (it doesn't if you do an apply).

@jreback
Copy link
Contributor

jreback commented Nov 18, 2013

can you add some tests where the groups vary in size and where the number you are asking for (e.g. head(3)) is > than the number in some/all groups.

@hayd
Copy link
Contributor Author

hayd commented Nov 18, 2013

Added tests for <=0 and > max group size. One already for between group sizes.

(though it's a small example, I think it covers...)

'''
Returns first n rows of each group.

Essentially equivalent to .apply(lambda x: x.head(n))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double backquote the code (.apply(..)) so it is rendered as code?

@jorisvandenbossche
Copy link
Member

Added some comments to the docstrings.

I have another question regarding the docstring style:

  • You changed the """ to ``'''. Is this recommended, or is there some 'policy' on this in pandas? Because I always use """` (and this is also used in PEP8: http://www.python.org/dev/peps/pep-0008/#documentation-strings).
    It doesn't matter at all for me what we use, but maybe we should try to be consistent, and I had the impression that `"""` is used the most in the pandas source code

@hayd
Copy link
Contributor Author

hayd commented Nov 19, 2013

@jorisvandenbossche Thanks for comments, will update.

@hayd
Copy link
Contributor Author

hayd commented Nov 19, 2013

Fixed these. Mentioned the ascending arg to cumcount, which I've purposely made kwarg only for now.

Noticed a weird related thing with nth on a DataFrame (it just doesn't work, and is kinda undefined), will make sep issue though. #5552

@@ -474,6 +473,10 @@ def ohlc(self):
return self._cython_agg_general('ohlc')

def nth(self, n):
"""
Return the nth row of each group
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing I noticed makes this a complete lie. Ooops, will delete then merge.

hayd added a commit that referenced this pull request Nov 20, 2013
PERF faster head, tail and size groupby methods
@hayd hayd merged commit e5e53ba into pandas-dev:master Nov 20, 2013
@hayd hayd deleted the groupby_head_tail branch November 20, 2013 01:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants